[fix]: use valid labels for SP loss normalization #130

Merged
kcz358 merged 2 commits into main from fix/sp_loss_scale
Jan 16, 2026

kcz358 commented Jan 16, 2026

Replace the attention_mask sum with the count of valid tokens (non-ignored labels) so that the loss is normalized correctly in sequence parallel (SP) mode.
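A minimal, library-free sketch of why the denominator matters. It is not the code from this PR. The ignore value `-100`, the helper names (`shard`, `local_stats`, `sp_loss`), and the toy data are all assumptions made for illustration, and plain Python sums stand in for the cross-rank all-reduce a real SP implementation would use. The point it illustrates: attention_mask also counts attended tokens whose labels are ignored (for example prompt tokens), so dividing by its sum can shrink the loss relative to a mean over supervised tokens.

```python
IGNORE_INDEX = -100  # assumed ignore value; the PR does not state which one it uses


def shard(seq, world_size):
    """Split a sequence into contiguous, equal chunks, one per SP rank."""
    step = len(seq) // world_size
    return [seq[r * step:(r + 1) * step] for r in range(world_size)]


def local_stats(token_losses, labels, attention_mask):
    """Per-rank partial sums: loss over valid labels, valid-label count, mask count."""
    loss_sum = sum(l for l, y in zip(token_losses, labels) if y != IGNORE_INDEX)
    valid = sum(1 for y in labels if y != IGNORE_INDEX)
    attended = sum(attention_mask)
    return loss_sum, valid, attended


def sp_loss(token_losses, labels, attention_mask, world_size, use_valid_labels=True):
    """Combine per-rank partial sums; the plain sums stand in for an all-reduce."""
    shards = zip(
        shard(token_losses, world_size),
        shard(labels, world_size),
        shard(attention_mask, world_size),
    )
    stats = [local_stats(l, y, m) for l, y, m in shards]
    loss_sum = sum(s[0] for s in stats)
    denom = sum(s[1] if use_valid_labels else s[2] for s in stats)
    return loss_sum / max(denom, 1)  # guard against a batch with no valid labels


# Toy sequence: 8 attended tokens, only 4 of which carry a supervised label.
labels = [-100, -100, -100, 5, 7, -100, 9, 2]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
token_losses = [0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 3.0, 2.0]

by_valid_labels = sp_loss(token_losses, labels, attention_mask, world_size=2)
by_attention_mask = sp_loss(
    token_losses, labels, attention_mask, world_size=2, use_valid_labels=False
)
print(by_valid_labels, by_attention_mask)
```

With this toy data the valid-label denominator gives the mean over the four supervised tokens, while the attention_mask denominator divides the same loss sum by all eight attended tokens and reports half that value.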

Motivation

Modifications

Commit Message Convention

Please follow our standardized commit message format:

  • [feat] - New features or functionality
  • [fix] - Bug fixes
  • [docs] - Documentation changes only
  • [style] - Code style changes (formatting, missing semicolons, etc.)
  • [refactor] - Code refactoring without changing functionality
  • [perf] - Performance improvements
  • [test] - Adding or updating tests
  • [chore] - Maintenance tasks, dependency updates, etc.
  • [ci] - CI/CD configuration changes

Examples:

  • [feat] add qwen omni iterable dataset support
  • [fix] resolve bagel model configuration error
  • [docs] update training guide with YAML examples

See CONTRIBUTING.md for more details.

CI/CD Checks

Your PR will automatically run the following checks:

  • Linting: Code formatting with black (line-length=120) and import sorting with isort
  • Run pre-commit run --all-files locally to verify before pushing

Checklist

  • Follow commit message convention (see above)
  • Run pre-commit run --all-files and ensure all checks pass
  • Format your code with black (line-length=120) and isort
  • Add unit tests for new functionality
  • Update documentation as needed, including docstrings or example tutorials
  • Ensure all CI/CD checks pass

Replace attention_mask sum with valid tokens (non-ignored labels)
count for proper loss normalization in sequence parallel mode.
kcz358 merged commit 45c1944 into main Jan 16, 2026
2 checks passed
kcz358 deleted the fix/sp_loss_scale branch January 16, 2026 09:59